Overview

Dataset statistics

Number of variables: 20
Number of observations: 768
Missing cells: 931
Missing cells (%): 6.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 393.8 KiB
Average record size in memory: 525.0 B
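The two derived figures above follow directly from the raw counts. A minimal sketch, assuming "Missing cells (%)" is missing cells divided by all cells (rows x variables) and "Average record size" is total memory divided by rows, with 1 KiB = 1024 B:

```python
# Figures copied from the overview above.
n_variables = 20
n_observations = 768
missing_cells = 931
total_size_kib = 393.8  # already rounded in the report

# Share of all cells (rows x variables) that are missing.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average in-memory size of one row, in bytes (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```

Because 393.8 KiB is itself rounded, the recomputed record size comes out at about 525 B rather than exactly 525.0 B.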

Variable types

Text: 2
Categorical: 8
DateTime: 1
Numeric: 8
Boolean: 1

Dataset

Description: JHB_DPHRU_013 - Quality-corrected harmonized data
Creator: RP2 Clinical Data Quality Team
Author: Quality-Checked Data
URL: HEAT Research Projects

Variable descriptions

Age (at enrolment): Patient age at study enrollment
CD4 cell count (cells/µL): CD4+ T lymphocyte count (missing codes removed)
HIV viral load (copies/mL): HIV RNA copies per mL (missing codes removed)
BMI (kg/m²): Body Mass Index (extreme values removed)
Waist circumference (cm): Waist circumference (corrected from mm to cm)
ALT (U/L): Alanine aminotransferase (missing codes removed)
Platelet count (×10³/µL): Platelet count (missing codes removed)
Hematocrit (%): Hematocrit (zero values removed)
Lymphocyte count (×10⁹/L): Lymphocyte absolute count (corrected labeling)
Neutrophil count (×10⁹/L): Neutrophil absolute count (corrected labeling)
cd4_correction_applied: Quality flag: CD4 missing codes removed
final_comprehensive_fix_applied: Quality flag: Comprehensive corrections applied
waist_circ_unit_correction_applied: Quality flag: Waist circ unit corrected

Alerts

study_source has constant value "JHB_DPHRU_013" [Constant]
latitude has constant value "-26.2041" [Constant]
longitude has constant value "28.0473" [Constant]
province has constant value "Gauteng" [Constant]
city has constant value "Johannesburg" [Constant]
jhb_subregion has constant value "Central_JHB" [Constant]
cd4_correction_applied has constant value "0.0" [Constant]
final_comprehensive_fix_applied has constant value "1.0" [Constant]
BMI (kg/m²) is highly correlated overall with Waist circumference (cm) and 1 other field [High correlation]
Waist circumference (cm) is highly correlated overall with BMI (kg/m²) and 2 other fields [High correlation]
hdl_cholesterol_mg_dL is highly correlated overall with total_cholesterol_mg_dL [High correlation]
height_m is highly correlated overall with waist_circ_unit_correction_applied [High correlation]
ldl_cholesterol_mg_dL is highly correlated overall with total_cholesterol_mg_dL [High correlation]
total_cholesterol_mg_dL is highly correlated overall with hdl_cholesterol_mg_dL and 1 other field [High correlation]
waist_circ_unit_correction_applied is highly correlated overall with Waist circumference (cm) and 2 other fields [High correlation]
weight_kg is highly correlated overall with BMI (kg/m²) and 2 other fields [High correlation]
Age (at enrolment) has 14 (1.8%) missing values [Missing]
Waist circumference (cm) has 205 (26.7%) missing values [Missing]
weight_kg has 205 (26.7%) missing values [Missing]
height_m has 331 (43.1%) missing values [Missing]
hdl_cholesterol_mg_dL has 58 (7.6%) missing values [Missing]
ldl_cholesterol_mg_dL has 58 (7.6%) missing values [Missing]
total_cholesterol_mg_dL has 59 (7.7%) missing values [Missing]
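The constant and missing-value alerts can be reproduced from raw columns. The sketch below is an illustrative assumption about how such checks work in general, not ydata-profiling's internal code; columns are plain Python lists with None for missing values. It also reconciles the totals: the seven missing counts listed above sum to 930, and the single missing BMI (kg/m²) value reported in the BMI section brings the total to the 931 missing cells in the overview. BMI has no Missing alert, which suggests the alert only fires above some threshold.

```python
from typing import Any, Dict, List, Optional


def column_alerts(columns: Dict[str, List[Optional[Any]]]) -> List[str]:
    """Return simple 'Constant' and 'Missing' alert strings per column.

    None marks a missing value. Illustrative sketch only; it is not
    ydata-profiling's own alerting code and applies no thresholds.
    """
    alerts = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        n_missing = len(values) - len(present)
        if present and len(set(present)) == 1:
            alerts.append(f'{name} has constant value "{present[0]}"')
        if n_missing:
            pct = 100 * n_missing / len(values)
            alerts.append(f"{name} has {n_missing} ({pct:.1f}%) missing values")
    return alerts


# Toy columns shaped like the report's (values are illustrative).
toy = {
    "city": ["Johannesburg"] * 4,
    "height_m": [1.59, None, 1.62, None],
}
for line in column_alerts(toy):
    print(line)

# Missing counts from the alerts, plus the single missing BMI value.
alerted_missing = {
    "Age (at enrolment)": 14,
    "Waist circumference (cm)": 205,
    "weight_kg": 205,
    "height_m": 331,
    "hdl_cholesterol_mg_dL": 58,
    "ldl_cholesterol_mg_dL": 58,
    "total_cholesterol_mg_dL": 59,
}
total_missing = sum(alerted_missing.values()) + 1  # + BMI (kg/m²)
print(total_missing)
```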

Reproduction

Analysis started: 2025-11-24 21:49:08.775510
Analysis finished: 2025-11-24 21:49:11.292008
Duration: 2.52 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json
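The reported duration is simply the difference between the two timestamps above. A minimal stdlib check:

```python
from datetime import datetime

started = datetime.fromisoformat("2025-11-24 21:49:08.775510")
finished = datetime.fromisoformat("2025-11-24 21:49:11.292008")

# Elapsed wall-clock time between the two reported timestamps.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```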

Variables

Distinct: 247
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Memory size: 55.5 KiB

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17
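Every value in this column is exactly 17 characters long, and the character tables for it contain only H, T, the underscore, the digits 0 to 9 and the letters A to F (19 distinct characters). That is consistent with a "HEAT_" prefix followed by 12 uppercase hexadecimal characters. The pattern below encodes that inferred format; it is an assumption drawn from the sample rows, not a documented identifier specification.

```python
import re

# Inferred (not documented) format: "HEAT_" + 12 uppercase hex characters = 17 chars.
HEAT_ID = re.compile(r"HEAT_[0-9A-F]{12}")


def is_valid_heat_id(value: str) -> bool:
    """True if the whole string matches the inferred HEAT_ identifier format."""
    return HEAT_ID.fullmatch(value) is not None


for sample in ["HEAT_0D62D894DA8A", "HEAT_4D7DE69432FE", "heat_2daa2908c08e"]:
    print(sample, is_valid_heat_id(sample))
```

The third sample is the lowercased form shown in the word-frequency table; it fails because the pattern is case-sensitive.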

Characters and Unicode

Total characters: 13056
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
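Python's standard `unicodedata.category` returns the two-letter general-category code of a character (for example `Lu`, `Nd`, `Pc`). The sketch below counts characters per category for a list of values; the mapping from codes to the long names used in this report is our own lookup table, covering only the categories that appear here.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["HEAT_0D62D894DA8A"]))
print(category_counts(["WBS 001"]))
print(category_counts(["-26.2041"]))
```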

Unique

Unique: 22
Unique (%): 2.9%

Sample

1st row: HEAT_0D62D894DA8A
2nd row: HEAT_4D7DE69432FE
3rd row: HEAT_4D7DE69432FE
4th row: HEAT_4D7DE69432FE
5th row: HEAT_4D7DE69432FE

Value: Count (Frequency %)
heat_2daa2908c08e: 4 (0.5%)
heat_f69702d4909e: 4 (0.5%)
heat_b9734125f4b4: 4 (0.5%)
heat_79baa4e04a76: 4 (0.5%)
heat_4be516edeae7: 4 (0.5%)
heat_72b672d3a182: 4 (0.5%)
heat_c79d795903e3: 4 (0.5%)
heat_ba33fc253513: 4 (0.5%)
heat_dc20966e386c: 4 (0.5%)
heat_c43b7f2754f1: 4 (0.5%)
Other values (237): 728 (94.8%)

Most occurring characters

Value: Count (Frequency %)
E: 1426 (10.9%)
A: 1310 (10.0%)
H: 768 (5.9%)
T: 768 (5.9%)
_: 768 (5.9%)
0: 629 (4.8%)
3: 620 (4.7%)
F: 591 (4.5%)
6: 585 (4.5%)
1: 585 (4.5%)
Other values (9): 5006 (38.3%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 6551 (50.2%)
Decimal Number: 5737 (43.9%)
Connector Punctuation: 768 (5.9%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
0: 629 (11.0%)
3: 620 (10.8%)
6: 585 (10.2%)
1: 585 (10.2%)
9: 584 (10.2%)
7: 573 (10.0%)
2: 572 (10.0%)
5: 570 (9.9%)
8: 525 (9.2%)
4: 494 (8.6%)
Uppercase Letter
Value: Count (Frequency %)
E: 1426 (21.8%)
A: 1310 (20.0%)
H: 768 (11.7%)
T: 768 (11.7%)
F: 591 (9.0%)
D: 581 (8.9%)
B: 554 (8.5%)
C: 553 (8.4%)
Connector Punctuation
Value: Count (Frequency %)
_: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 6551 (50.2%)
Common: 6505 (49.8%)

Most frequent character per script

Common
Value: Count (Frequency %)
_: 768 (11.8%)
0: 629 (9.7%)
3: 620 (9.5%)
6: 585 (9.0%)
1: 585 (9.0%)
9: 584 (9.0%)
7: 573 (8.8%)
2: 572 (8.8%)
5: 570 (8.8%)
8: 525 (8.1%)
Latin
Value: Count (Frequency %)
E: 1426 (21.8%)
A: 1310 (20.0%)
H: 768 (11.7%)
T: 768 (11.7%)
F: 591 (9.0%)
D: 581 (8.9%)
B: 554 (8.5%)
C: 553 (8.4%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 13056 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
E: 1426 (10.9%)
A: 1310 (10.0%)
H: 768 (5.9%)
T: 768 (5.9%)
_: 768 (5.9%)
0: 629 (4.8%)
3: 620 (4.7%)
F: 591 (4.5%)
6: 585 (4.5%)
1: 585 (4.5%)
Other values (9): 5006 (38.3%)
Distinct: 247
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Memory size: 48.0 KiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 5376
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 22
Unique (%): 2.9%

Sample

1st row: WBS 001
2nd row: WBS 003
3rd row: WBS 003
4th row: WBS 003
5th row: WBS 003

Value: Count (Frequency %)
wbs: 768 (50.0%)
017: 4 (0.3%)
020: 4 (0.3%)
304: 4 (0.3%)
022: 4 (0.3%)
023: 4 (0.3%)
027: 4 (0.3%)
028: 4 (0.3%)
301: 4 (0.3%)
033: 4 (0.3%)
Other values (238): 732 (47.7%)
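The lowercase table above counts words rather than whole values: each "WBS nnn" value splits into two tokens, so 768 values yield 1536 words, "wbs" accounts for 768 of them (50.0%), and a code that appears 4 times accounts for 4/1536, about 0.3%. A minimal sketch of that counting, assuming lowercasing and whitespace splitting (the profiler's exact tokenizer may differ):

```python
from collections import Counter


def word_frequencies(values):
    """Lowercase each value, split on whitespace, and count tokens.

    Returns {word: (count, percentage of all tokens)} in descending count order.
    """
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    total = sum(counts.values())
    return {word: (n, 100 * n / total) for word, n in counts.most_common()}


freqs = word_frequencies(["WBS 001", "WBS 003", "WBS 003", "WBS 017"])
for word, (count, pct) in freqs.items():
    print(f"{word}: {count} ({pct:.1f}%)")
```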

Most occurring characters

Value: Count (Frequency %)
W: 768 (14.3%)
B: 768 (14.3%)
S: 768 (14.3%)
(space): 768 (14.3%)
2: 417 (7.8%)
0: 415 (7.7%)
1: 410 (7.6%)
3: 214 (4.0%)
5: 150 (2.8%)
6: 143 (2.7%)
Other values (4): 555 (10.3%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 2304 (42.9%)
Decimal Number: 2304 (42.9%)
Space Separator: 768 (14.3%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
2: 417 (18.1%)
0: 415 (18.0%)
1: 410 (17.8%)
3: 214 (9.3%)
5: 150 (6.5%)
6: 143 (6.2%)
7: 143 (6.2%)
8: 141 (6.1%)
9: 139 (6.0%)
4: 132 (5.7%)
Uppercase Letter
Value: Count (Frequency %)
W: 768 (33.3%)
B: 768 (33.3%)
S: 768 (33.3%)
Space Separator
Value: Count (Frequency %)
(space): 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Common: 3072 (57.1%)
Latin: 2304 (42.9%)

Most frequent character per script

Common
Value: Count (Frequency %)
(space): 768 (25.0%)
2: 417 (13.6%)
0: 415 (13.5%)
1: 410 (13.3%)
3: 214 (7.0%)
5: 150 (4.9%)
6: 143 (4.7%)
7: 143 (4.7%)
8: 141 (4.6%)
9: 139 (4.5%)
Latin
Value: Count (Frequency %)
W: 768 (33.3%)
B: 768 (33.3%)
S: 768 (33.3%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 5376 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
W: 768 (14.3%)
B: 768 (14.3%)
S: 768 (14.3%)
(space): 768 (14.3%)
2: 417 (7.8%)
0: 415 (7.7%)
1: 410 (7.6%)
3: 214 (4.0%)
5: 150 (2.8%)
6: 143 (2.7%)
Other values (4): 555 (10.3%)

study_source
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 52.5 KiB
JHB_DPHRU_013: 768

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 9984
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: JHB_DPHRU_013
2nd row: JHB_DPHRU_013
3rd row: JHB_DPHRU_013
4th row: JHB_DPHRU_013
5th row: JHB_DPHRU_013

Common Values

Value: Count (Frequency %)
JHB_DPHRU_013: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
jhb_dphru_013: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
H: 1536 (15.4%)
_: 1536 (15.4%)
J: 768 (7.7%)
B: 768 (7.7%)
D: 768 (7.7%)
P: 768 (7.7%)
R: 768 (7.7%)
U: 768 (7.7%)
0: 768 (7.7%)
1: 768 (7.7%)

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 6144 (61.5%)
Decimal Number: 2304 (23.1%)
Connector Punctuation: 1536 (15.4%)

Most frequent character per category

Uppercase Letter
Value: Count (Frequency %)
H: 1536 (25.0%)
J: 768 (12.5%)
B: 768 (12.5%)
D: 768 (12.5%)
P: 768 (12.5%)
R: 768 (12.5%)
U: 768 (12.5%)
Decimal Number
Value: Count (Frequency %)
0: 768 (33.3%)
1: 768 (33.3%)
3: 768 (33.3%)
Connector Punctuation
Value: Count (Frequency %)
_: 1536 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 6144 (61.5%)
Common: 3840 (38.5%)

Most frequent character per script

Latin
Value: Count (Frequency %)
H: 1536 (25.0%)
J: 768 (12.5%)
B: 768 (12.5%)
D: 768 (12.5%)
P: 768 (12.5%)
R: 768 (12.5%)
U: 768 (12.5%)
Common
Value: Count (Frequency %)
_: 1536 (40.0%)
0: 768 (20.0%)
1: 768 (20.0%)
3: 768 (20.0%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 9984 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
H: 1536 (15.4%)
_: 1536 (15.4%)
J: 768 (7.7%)
B: 768 (7.7%)
D: 768 (7.7%)
P: 768 (7.7%)
R: 768 (7.7%)
U: 768 (7.7%)
0: 768 (7.7%)
1: 768 (7.7%)
Distinct: 232
Distinct (%): 30.2%
Missing: 0
Missing (%): 0.0%
Memory size: 12.0 KiB
Minimum: 2011-02-10 00:00:00
Maximum: 2013-06-19 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
[Histogram with fixed size bins (bins=50)]
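The name of this datetime variable was lost in extraction, but its range runs from 2011-02-10 to 2013-06-19. A quick stdlib check of how long that window is:

```python
from datetime import date

first = date(2011, 2, 10)   # Minimum
last = date(2013, 6, 19)    # Maximum

# Whole days between the earliest and latest dates (2012 is a leap year).
span_days = (last - first).days
print(span_days, "days, about", round(span_days / 365.25, 1), "years")
```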

Age (at enrolment)
Real number (ℝ)

Missing 

Patient age at study enrollment

Distinct: 183
Distinct (%): 24.3%
Missing: 14
Missing (%): 1.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.533554
Minimum: 18.1
Maximum: 51
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 18.1
5th percentile: 22
Q1: 27.85
median: 33.95
Q3: 39
95th percentile: 46
Maximum: 51
Range: 32.9
Interquartile range (IQR): 11.15
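Range and IQR are derived from the quantiles above: Range = Maximum - Minimum = 51 - 18.1 = 32.9, and IQR = Q3 - Q1 = 39 - 27.85 = 11.15. Non-integer quartiles such as 27.85 suggest linear interpolation between order statistics. The sketch below assumes that method (it is what `statistics.quantiles(..., method="inclusive")` computes) and runs on toy ages, because the raw column is not included in this report.

```python
import statistics


def quartiles(values):
    """Q1, median and Q3 by linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Toy ages; the raw column is not part of this report.
ages = [18.1, 22.0, 27.0, 29.5, 33.9, 34.0, 39.0, 40.0, 46.0, 51.0]
q1, median, q3 = quartiles(ages)
print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} IQR={q3 - q1:.3f}")
print(f"Range={max(ages) - min(ages):.1f}")
```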

Descriptive statistics

Standard deviation: 7.3527855
Coefficient of variation (CV): 0.21926651
Kurtosis: -0.80768914
Mean: 33.533554
Median Absolute Deviation (MAD): 5.95
Skewness: 0.055500259
Sum: 25284.3
Variance: 54.063454
Monotonicity: Not monotonic
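Several of these statistics are functions of others: variance is the squared standard deviation, CV is standard deviation divided by mean, and Sum is mean times the number of non-missing values (768 - 14 = 754). The check below uses only figures from this section. MAD here is assumed to be the unscaled median of absolute deviations from the median; it is shown on toy data because the raw values are not included.

```python
import statistics

mean, sd, n_present = 33.533554, 7.3527855, 768 - 14

variance = sd ** 2        # report: 54.063454
cv = sd / mean            # report: 0.21926651
total = mean * n_present  # report: 25284.3

print(f"variance={variance:.6f} cv={cv:.8f} sum={total:.1f}")


def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(x)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)


print(median_absolute_deviation([18, 28, 34, 39, 51]))  # toy data
```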
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
40: 32 (4.2%)
34: 31 (4.0%)
39: 31 (4.0%)
37: 29 (3.8%)
30: 27 (3.5%)
31: 25 (3.3%)
38: 23 (3.0%)
41: 22 (2.9%)
26: 21 (2.7%)
35: 21 (2.7%)
Other values (173): 492 (64.1%)

Minimum 10 values
Value: Count (Frequency %)
18.1: 1 (0.1%)
18.8: 1 (0.1%)
19: 3 (0.4%)
19.3: 1 (0.1%)
19.4: 2 (0.3%)
19.5: 1 (0.1%)
19.6: 1 (0.1%)
20: 9 (1.2%)
20.1: 1 (0.1%)
20.6: 1 (0.1%)

Maximum 10 values
Value: Count (Frequency %)
51: 1 (0.1%)
50: 3 (0.4%)
49.1: 1 (0.1%)
49: 5 (0.7%)
48: 8 (1.0%)
47.9: 1 (0.1%)
47.2: 1 (0.1%)
47: 10 (1.3%)
46.6: 1 (0.1%)
46.4: 1 (0.1%)

latitude
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.8 KiB
-26.2041: 768

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 6144
Distinct characters: 7
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -26.2041
2nd row: -26.2041
3rd row: -26.2041
4th row: -26.2041
5th row: -26.2041

Common Values

Value: Count (Frequency %)
-26.2041: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
26.2041: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
2: 1536 (25.0%)
-: 768 (12.5%)
6: 768 (12.5%)
.: 768 (12.5%)
0: 768 (12.5%)
4: 768 (12.5%)
1: 768 (12.5%)

Most occurring categories

Value: Count (Frequency %)
Decimal Number: 4608 (75.0%)
Dash Punctuation: 768 (12.5%)
Other Punctuation: 768 (12.5%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
2: 1536 (33.3%)
6: 768 (16.7%)
0: 768 (16.7%)
4: 768 (16.7%)
1: 768 (16.7%)
Dash Punctuation
Value: Count (Frequency %)
-: 768 (100.0%)
Other Punctuation
Value: Count (Frequency %)
.: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Common: 6144 (100.0%)

Most frequent character per script

Common
Value: Count (Frequency %)
2: 1536 (25.0%)
-: 768 (12.5%)
6: 768 (12.5%)
.: 768 (12.5%)
0: 768 (12.5%)
4: 768 (12.5%)
1: 768 (12.5%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 6144 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
2: 1536 (25.0%)
-: 768 (12.5%)
6: 768 (12.5%)
.: 768 (12.5%)
0: 768 (12.5%)
4: 768 (12.5%)
1: 768 (12.5%)

longitude
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.0 KiB
28.0473: 768

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 5376
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 28.0473
2nd row: 28.0473
3rd row: 28.0473
4th row: 28.0473
5th row: 28.0473

Common Values

Value: Count (Frequency %)
28.0473: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
28.0473: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
2: 768 (14.3%)
8: 768 (14.3%)
.: 768 (14.3%)
0: 768 (14.3%)
4: 768 (14.3%)
7: 768 (14.3%)
3: 768 (14.3%)

Most occurring categories

Value: Count (Frequency %)
Decimal Number: 4608 (85.7%)
Other Punctuation: 768 (14.3%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
2: 768 (16.7%)
8: 768 (16.7%)
0: 768 (16.7%)
4: 768 (16.7%)
7: 768 (16.7%)
3: 768 (16.7%)
Other Punctuation
Value: Count (Frequency %)
.: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Common: 5376 (100.0%)

Most frequent character per script

Common
Value: Count (Frequency %)
2: 768 (14.3%)
8: 768 (14.3%)
.: 768 (14.3%)
0: 768 (14.3%)
4: 768 (14.3%)
7: 768 (14.3%)
3: 768 (14.3%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 5376 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
2: 768 (14.3%)
8: 768 (14.3%)
.: 768 (14.3%)
0: 768 (14.3%)
4: 768 (14.3%)
7: 768 (14.3%)
3: 768 (14.3%)

province
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.0 KiB
Gauteng: 768

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 5376
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Gauteng
2nd row: Gauteng
3rd row: Gauteng
4th row: Gauteng
5th row: Gauteng

Common Values

Value: Count (Frequency %)
Gauteng: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
gauteng: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
G: 768 (14.3%)
a: 768 (14.3%)
u: 768 (14.3%)
t: 768 (14.3%)
e: 768 (14.3%)
n: 768 (14.3%)
g: 768 (14.3%)

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 4608 (85.7%)
Uppercase Letter: 768 (14.3%)

Most frequent character per category

Lowercase Letter
Value: Count (Frequency %)
a: 768 (16.7%)
u: 768 (16.7%)
t: 768 (16.7%)
e: 768 (16.7%)
n: 768 (16.7%)
g: 768 (16.7%)
Uppercase Letter
Value: Count (Frequency %)
G: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 5376 (100.0%)

Most frequent character per script

Latin
Value: Count (Frequency %)
G: 768 (14.3%)
a: 768 (14.3%)
u: 768 (14.3%)
t: 768 (14.3%)
e: 768 (14.3%)
n: 768 (14.3%)
g: 768 (14.3%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 5376 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
G: 768 (14.3%)
a: 768 (14.3%)
u: 768 (14.3%)
t: 768 (14.3%)
e: 768 (14.3%)
n: 768 (14.3%)
g: 768 (14.3%)

city
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 51.8 KiB
Johannesburg: 768

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 9216
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Johannesburg
2nd row: Johannesburg
3rd row: Johannesburg
4th row: Johannesburg
5th row: Johannesburg

Common Values

Value: Count (Frequency %)
Johannesburg: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
johannesburg: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
n: 1536 (16.7%)
J: 768 (8.3%)
o: 768 (8.3%)
h: 768 (8.3%)
a: 768 (8.3%)
e: 768 (8.3%)
s: 768 (8.3%)
b: 768 (8.3%)
u: 768 (8.3%)
r: 768 (8.3%)

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 8448 (91.7%)
Uppercase Letter: 768 (8.3%)

Most frequent character per category

Lowercase Letter
Value: Count (Frequency %)
n: 1536 (18.2%)
o: 768 (9.1%)
h: 768 (9.1%)
a: 768 (9.1%)
e: 768 (9.1%)
s: 768 (9.1%)
b: 768 (9.1%)
u: 768 (9.1%)
r: 768 (9.1%)
g: 768 (9.1%)
Uppercase Letter
Value: Count (Frequency %)
J: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 9216 (100.0%)

Most frequent character per script

Latin
Value: Count (Frequency %)
n: 1536 (16.7%)
J: 768 (8.3%)
o: 768 (8.3%)
h: 768 (8.3%)
a: 768 (8.3%)
e: 768 (8.3%)
s: 768 (8.3%)
b: 768 (8.3%)
u: 768 (8.3%)
r: 768 (8.3%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 9216 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
n: 1536 (16.7%)
J: 768 (8.3%)
o: 768 (8.3%)
h: 768 (8.3%)
a: 768 (8.3%)
e: 768 (8.3%)
s: 768 (8.3%)
b: 768 (8.3%)
u: 768 (8.3%)
r: 768 (8.3%)

jhb_subregion
Categorical

Constant 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 51.0 KiB
Central_JHB: 768

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 8448
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Central_JHB
2nd row: Central_JHB
3rd row: Central_JHB
4th row: Central_JHB
5th row: Central_JHB

Common Values

Value: Count (Frequency %)
Central_JHB: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
central_jhb: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
C: 768 (9.1%)
e: 768 (9.1%)
n: 768 (9.1%)
t: 768 (9.1%)
r: 768 (9.1%)
a: 768 (9.1%)
l: 768 (9.1%)
_: 768 (9.1%)
J: 768 (9.1%)
H: 768 (9.1%)

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 4608 (54.5%)
Uppercase Letter: 3072 (36.4%)
Connector Punctuation: 768 (9.1%)

Most frequent character per category

Lowercase Letter
Value: Count (Frequency %)
e: 768 (16.7%)
n: 768 (16.7%)
t: 768 (16.7%)
r: 768 (16.7%)
a: 768 (16.7%)
l: 768 (16.7%)
Uppercase Letter
Value: Count (Frequency %)
C: 768 (25.0%)
J: 768 (25.0%)
H: 768 (25.0%)
B: 768 (25.0%)
Connector Punctuation
Value: Count (Frequency %)
_: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Latin: 7680 (90.9%)
Common: 768 (9.1%)

Most frequent character per script

Latin
Value: Count (Frequency %)
C: 768 (10.0%)
e: 768 (10.0%)
n: 768 (10.0%)
t: 768 (10.0%)
r: 768 (10.0%)
a: 768 (10.0%)
l: 768 (10.0%)
J: 768 (10.0%)
H: 768 (10.0%)
B: 768 (10.0%)
Common
Value: Count (Frequency %)
_: 768 (100.0%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 8448 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
C: 768 (9.1%)
e: 768 (9.1%)
n: 768 (9.1%)
t: 768 (9.1%)
r: 768 (9.1%)
a: 768 (9.1%)
l: 768 (9.1%)
_: 768 (9.1%)
J: 768 (9.1%)
H: 768 (9.1%)

BMI (kg/m²)
Real number (ℝ)

High correlation 

Body Mass Index (extreme values removed)

Distinct: 248
Distinct (%): 32.3%
Missing: 1
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.852803
Minimum: 15.1
Maximum: 57
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 15.1
5th percentile: 19.13
Q1: 23
median: 26.7
Q3: 31.5
95th percentile: 40.54
Maximum: 57
Range: 41.9
Interquartile range (IQR): 8.5

Descriptive statistics

Standard deviation: 6.6900116
Coefficient of variation (CV): 0.24019168
Kurtosis: 1.6551874
Mean: 27.852803
Median Absolute Deviation (MAD): 4.1
Skewness: 1.0682353
Sum: 21363.1
Variance: 44.756255
Monotonicity: Not monotonic
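BMI is weight in kilograms divided by the square of height in metres, so its high correlation with weight_kg is expected. Note that BMI is missing for only 1 row while weight_kg and height_m are missing for 205 and 331 rows, so the stored BMI cannot have been computed from these two columns for every row. A sketch of the formula that returns None when an input is missing:

```python
from typing import Optional


def bmi(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """Body Mass Index in kg/m²; None if an input is missing or height is not positive."""
    if weight_kg is None or height_m is None or height_m <= 0:
        return None
    return weight_kg / height_m ** 2


print(round(bmi(70.0, 1.6), 2))
print(bmi(70.0, None))
```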
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
26.3: 11 (1.4%)
25: 10 (1.3%)
21.8: 10 (1.3%)
26.7: 9 (1.2%)
27.4: 8 (1.0%)
25.8: 8 (1.0%)
32.3: 8 (1.0%)
21.5: 8 (1.0%)
22.9: 8 (1.0%)
28.9: 7 (0.9%)
Other values (238): 680 (88.5%)

Minimum 10 values
Value: Count (Frequency %)
15.1: 1 (0.1%)
15.3: 1 (0.1%)
16: 1 (0.1%)
16.1: 1 (0.1%)
16.6: 1 (0.1%)
16.8: 1 (0.1%)
16.9: 1 (0.1%)
17.1: 1 (0.1%)
17.2: 1 (0.1%)
17.3: 1 (0.1%)

Maximum 10 values
Value: Count (Frequency %)
57: 1 (0.1%)
56.1: 1 (0.1%)
54.9: 1 (0.1%)
54.3: 1 (0.1%)
50.7: 1 (0.1%)
50.4: 1 (0.1%)
50.1: 1 (0.1%)
49.8: 3 (0.4%)
49: 1 (0.1%)
46.4: 2 (0.3%)

Waist circumference (cm)
Real number (ℝ)

High correlation  Missing 

Waist circumference (corrected from mm to cm)

Distinct: 115
Distinct (%): 20.4%
Missing: 205
Missing (%): 26.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 89.362345
Minimum: 2.9
Maximum: 915
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 2.9
5th percentile: 67.55
Q1: 78
median: 86.5
Q3: 96.5
95th percentile: 115.25
Maximum: 915
Range: 912.1
Interquartile range (IQR): 18.5

Descriptive statistics

Standard deviation: 38.256862
Coefficient of variation (CV): 0.42810942
Kurtosis: 387.32681
Mean: 89.362345
Median Absolute Deviation (MAD): 8.5
Skewness: 17.935657
Sum: 50311
Variance: 1463.5875
Monotonicity: Not monotonic
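Although the variable description says waist circumference was corrected from mm to cm, the extreme values still include 2.9, 8.1, 10.8 and 915, which drive the skewness (17.9) and kurtosis (387). 915 is plausibly a value still recorded in millimetres (91.5 cm), while values below about 11 cm are not plausible adult waist circumferences. The sketch below classifies values against a plausibility window; the 40 to 200 cm bounds are hypothetical thresholds chosen for illustration, not study rules.

```python
# Hypothetical plausibility window for adult waist circumference, in cm.
LOW_CM, HIGH_CM = 40.0, 200.0


def classify_waist(value_cm):
    """Return 'missing', 'ok', 'likely_mm' (plausible once divided by 10) or 'implausible'."""
    if value_cm is None:
        return "missing"
    if LOW_CM <= value_cm <= HIGH_CM:
        return "ok"
    if LOW_CM <= value_cm / 10 <= HIGH_CM:
        return "likely_mm"
    return "implausible"


# Values taken from the extreme-value tables for this variable.
for v in [2.9, 8.1, 10.8, 59, 87, 151, 915, None]:
    print(v, classify_waist(v))
```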
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
87: 22 (2.9%)
85: 19 (2.5%)
81: 19 (2.5%)
78: 18 (2.3%)
89: 18 (2.3%)
86: 17 (2.2%)
79: 16 (2.1%)
74: 16 (2.1%)
76: 15 (2.0%)
77: 13 (1.7%)
Other values (105): 390 (50.8%)
(Missing): 205 (26.7%)

Minimum 10 values
Value: Count (Frequency %)
2.9: 1 (0.1%)
8.1: 1 (0.1%)
10.8: 1 (0.1%)
59: 1 (0.1%)
61: 2 (0.3%)
62: 1 (0.1%)
63: 4 (0.5%)
64: 1 (0.1%)
65: 2 (0.3%)
66: 4 (0.5%)

Maximum 10 values
Value: Count (Frequency %)
915: 1 (0.1%)
151: 1 (0.1%)
145: 1 (0.1%)
143.5: 1 (0.1%)
140: 1 (0.1%)
133: 1 (0.1%)
131: 1 (0.1%)
130: 1 (0.1%)
129.5: 1 (0.1%)
128: 2 (0.3%)

weight_kg
Real number (ℝ)

High correlation  Missing 

Distinct: 360
Distinct (%): 63.9%
Missing: 205
Missing (%): 26.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.787744
Minimum: 35.1
Maximum: 140.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 35.1
5th percentile: 47.61
Q1: 57.9
median: 67.2
Q3: 78.4
95th percentile: 102.99
Maximum: 140.5
Range: 105.4
Interquartile range (IQR): 20.5

Descriptive statistics

Standard deviation: 16.938157
Coefficient of variation (CV): 0.24270962
Kurtosis: 1.3539238
Mean: 69.787744
Median Absolute Deviation (MAD): 10
Skewness: 0.98611018
Sum: 39290.5
Variance: 286.90115
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
72.3: 5 (0.7%)
65.4: 5 (0.7%)
59.6: 4 (0.5%)
54: 4 (0.5%)
53.7: 4 (0.5%)
76.6: 4 (0.5%)
65.6: 4 (0.5%)
61.8: 4 (0.5%)
55.1: 4 (0.5%)
69.4: 4 (0.5%)
Other values (350): 521 (67.8%)
(Missing): 205 (26.7%)

Minimum 10 values
Value: Count (Frequency %)
35.1: 1 (0.1%)
35.8: 1 (0.1%)
36.4: 1 (0.1%)
39.8: 1 (0.1%)
41.6: 1 (0.1%)
41.8: 1 (0.1%)
42: 1 (0.1%)
42.1: 2 (0.3%)
42.5: 1 (0.1%)
43.6: 1 (0.1%)

Maximum 10 values
Value: Count (Frequency %)
140.5: 1 (0.1%)
135.2: 1 (0.1%)
133.8: 1 (0.1%)
130.6: 1 (0.1%)
129.1: 1 (0.1%)
121.9: 1 (0.1%)
118: 1 (0.1%)
116.3: 1 (0.1%)
115.8: 1 (0.1%)
114.7: 1 (0.1%)

height_m
Real number (ℝ)

High correlation  Missing 

Distinct: 194
Distinct (%): 44.4%
Missing: 331
Missing (%): 43.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5870046
Minimum: 1.39
Maximum: 1.785
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 1.39
5th percentile: 1.4918
Q1: 1.552
median: 1.589
Q3: 1.619
95th percentile: 1.6772
Maximum: 1.785
Range: 0.395
Interquartile range (IQR): 0.067

Descriptive statistics

Standard deviation: 0.057610126
Coefficient of variation (CV): 0.036301172
Kurtosis: 1.2076171
Mean: 1.5870046
Median Absolute Deviation (MAD): 0.034
Skewness: -0.018390588
Sum: 693.521
Variance: 0.0033189266
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
1.584: 7 (0.9%)
1.591: 7 (0.9%)
1.588: 7 (0.9%)
1.61: 7 (0.9%)
1.606: 6 (0.8%)
1.568: 6 (0.8%)
1.6: 6 (0.8%)
1.595: 6 (0.8%)
1.598: 6 (0.8%)
1.585: 5 (0.7%)
Other values (184): 374 (48.7%)
(Missing): 331 (43.1%)

Minimum 10 values
Value: Count (Frequency %)
1.39: 1 (0.1%)
1.404: 1 (0.1%)
1.405: 1 (0.1%)
1.406: 1 (0.1%)
1.416: 1 (0.1%)
1.417: 1 (0.1%)
1.457: 1 (0.1%)
1.46: 1 (0.1%)
1.466: 1 (0.1%)
1.467: 1 (0.1%)

Maximum 10 values
Value: Count (Frequency %)
1.785: 1 (0.1%)
1.78: 1 (0.1%)
1.762: 1 (0.1%)
1.759: 1 (0.1%)
1.757: 1 (0.1%)
1.733: 1 (0.1%)
1.717: 1 (0.1%)
1.715: 1 (0.1%)
1.71: 1 (0.1%)
1.708: 1 (0.1%)

hdl_cholesterol_mg_dL
Real number (ℝ)

High correlation  Missing 

Distinct: 174
Distinct (%): 24.5%
Missing: 58
Missing (%): 7.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1211127
Minimum: 0.28
Maximum: 3.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 0.28
5th percentile: 0.51
Q1: 0.83
median: 1.07
Q3: 1.37
95th percentile: 1.8855
Maximum: 3.7
Range: 3.42
Interquartile range (IQR): 0.54

Descriptive statistics

Standard deviation: 0.44352229
Coefficient of variation (CV): 0.39560902
Kurtosis: 4.4712255
Mean: 1.1211127
Median Absolute Deviation (MAD): 0.26
Skewness: 1.2913394
Sum: 795.99
Variance: 0.19671202
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
0.8: 13 (1.7%)
1.04: 13 (1.7%)
0.85: 13 (1.7%)
0.93: 13 (1.7%)
1.1: 13 (1.7%)
1.18: 11 (1.4%)
0.95: 11 (1.4%)
1: 10 (1.3%)
0.84: 10 (1.3%)
0.87: 9 (1.2%)
Other values (164): 594 (77.3%)
(Missing): 58 (7.6%)

Minimum 10 values
Value: Count (Frequency %)
0.28: 1 (0.1%)
0.32: 1 (0.1%)
0.33: 2 (0.3%)
0.34: 2 (0.3%)
0.35: 1 (0.1%)
0.36: 2 (0.3%)
0.37: 2 (0.3%)
0.39: 1 (0.1%)
0.4: 2 (0.3%)
0.41: 2 (0.3%)

Maximum 10 values
Value: Count (Frequency %)
3.7: 3 (0.4%)
2.8: 1 (0.1%)
2.53: 2 (0.3%)
2.49: 1 (0.1%)
2.44: 1 (0.1%)
2.31: 1 (0.1%)
2.3: 1 (0.1%)
2.29: 1 (0.1%)
2.24: 2 (0.3%)
2.23: 1 (0.1%)

ldl_cholesterol_mg_dL
Real number (ℝ)

High correlation  Missing 

Distinct: 261
Distinct (%): 36.8%
Missing: 58
Missing (%): 7.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6717042
Minimum: 0
Maximum: 6.04
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.6745
Q1: 1.11
median: 1.535
Q3: 2.07
95th percentile: 3.18
Maximum: 6.04
Range: 6.04
Interquartile range (IQR): 0.96

Descriptive statistics

Standard deviation: 0.77008108
Coefficient of variation (CV): 0.4606563
Kurtosis: 1.8978142
Mean: 1.6717042
Median Absolute Deviation (MAD): 0.475
Skewness: 1.0866871
Sum: 1186.91
Variance: 0.59302488
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
1.01: 9 (1.2%)
1.12: 9 (1.2%)
1.32: 9 (1.2%)
1.37: 8 (1.0%)
1.29: 8 (1.0%)
1.18: 7 (0.9%)
2.06: 7 (0.9%)
1.94: 7 (0.9%)
1.26: 7 (0.9%)
1.76: 7 (0.9%)
Other values (251): 632 (82.3%)
(Missing): 58 (7.6%)

Minimum 10 values
Value: Count (Frequency %)
0: 1 (0.1%)
0.33: 1 (0.1%)
0.39: 1 (0.1%)
0.42: 2 (0.3%)
0.45: 1 (0.1%)
0.46: 1 (0.1%)
0.47: 1 (0.1%)
0.5: 3 (0.4%)
0.55: 5 (0.7%)
0.56: 4 (0.5%)

Maximum 10 values
Value: Count (Frequency %)
6.04: 1 (0.1%)
4.41: 1 (0.1%)
4.28: 1 (0.1%)
4.25: 2 (0.3%)
4.19: 1 (0.1%)
4.13: 1 (0.1%)
3.97: 1 (0.1%)
3.94: 1 (0.1%)
3.89: 1 (0.1%)
3.87: 1 (0.1%)

total_cholesterol_mg_dL
Real number (ℝ)

High correlation  Missing 

Distinct: 331
Distinct (%): 46.7%
Missing: 59
Missing (%): 7.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1249083
Minimum: 1.12
Maximum: 10.48
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 1.12
5th percentile: 2.538
Q1: 3.39
median: 4
Q3: 4.81
95th percentile: 6.016
Maximum: 10.48
Range: 9.36
Interquartile range (IQR): 1.42

Descriptive statistics

Standard deviation: 1.1613229
Coefficient of variation (CV): 0.28153907
Kurtosis: 3.3551237
Mean: 4.1249083
Median Absolute Deviation (MAD): 0.7
Skewness: 1.0133115
Sum: 2924.56
Variance: 1.3486708
Monotonicity: Not monotonic
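The three lipid columns are named with an `_mg_dL` suffix, yet their values (median total cholesterol 4.0, median HDL 1.07, median LDL 1.535) sit in the range usually reported in mmol/L; in mg/dL a total cholesterol of 4 would be far below any physiological value. If the values are in fact mmol/L, an assumption to confirm against the source data dictionary, the standard cholesterol conversion (1 mmol/L is about 38.67 mg/dL) gives the mg/dL equivalents:

```python
# Conversion factor for cholesterol (total, HDL, LDL): 1 mmol/L ≈ 38.67 mg/dL.
MG_DL_PER_MMOL_L = 38.67


def mmol_to_mg_dl(value_mmol_l):
    """Convert a cholesterol concentration from mmol/L to mg/dL."""
    return value_mmol_l * MG_DL_PER_MMOL_L


# Medians from this report, assumed (not confirmed) to be in mmol/L.
medians = {"total": 4.0, "hdl": 1.07, "ldl": 1.535}
for name, value in medians.items():
    print(f"{name}: {value} mmol/L -> {mmol_to_mg_dl(value):.0f} mg/dL")
```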
[Histogram with fixed size bins (bins=50)]
Common Values
Value: Count (Frequency %)
4: 10 (1.3%)
4.11: 9 (1.2%)
3.62: 7 (0.9%)
4.57: 6 (0.8%)
3.89: 6 (0.8%)
3.48: 6 (0.8%)
4.93: 6 (0.8%)
3.68: 6 (0.8%)
2.77: 5 (0.7%)
3.52: 5 (0.7%)
Other values (321): 643 (83.7%)
(Missing): 59 (7.7%)

Minimum 10 values
Value: Count (Frequency %)
1.12: 1 (0.1%)
1.22: 1 (0.1%)
1.29: 1 (0.1%)
1.38: 1 (0.1%)
1.54: 1 (0.1%)
1.59: 1 (0.1%)
1.8: 2 (0.3%)
1.85: 1 (0.1%)
2.01: 2 (0.3%)
2.06: 1 (0.1%)

Maximum 10 values
Value: Count (Frequency %)
10.48: 1 (0.1%)
10.29: 1 (0.1%)
9.28: 2 (0.3%)
9.04: 1 (0.1%)
8.65: 1 (0.1%)
7.7: 1 (0.1%)
7.59: 1 (0.1%)
7.3: 1 (0.1%)
7.28: 1 (0.1%)
6.82: 1 (0.1%)

cd4_correction_applied
Categorical

Constant 

Quality flag: CD4 missing codes removed

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 45.0 KiB
0.0: 768

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2304
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value: Count (Frequency %)
0.0: 768 (100.0%)

Length

[Histogram of lengths of the category]

Common Values (Plot)

Value: Count (Frequency %)
0.0: 768 (100.0%)

Most occurring characters

Value: Count (Frequency %)
0: 1536 (66.7%)
.: 768 (33.3%)

Most occurring categories

Value: Count (Frequency %)
Decimal Number: 1536 (66.7%)
Other Punctuation: 768 (33.3%)

Most frequent character per category

Decimal Number
Value: Count (Frequency %)
0: 1536 (100.0%)
Other Punctuation
Value: Count (Frequency %)
.: 768 (100.0%)

Most occurring scripts

Value: Count (Frequency %)
Common: 2304 (100.0%)

Most frequent character per script

Common
Value: Count (Frequency %)
0: 1536 (66.7%)
.: 768 (33.3%)

Most occurring blocks

Value: Count (Frequency %)
ASCII: 2304 (100.0%)

Most frequent character per block

ASCII
Value: Count (Frequency %)
0: 1536 (66.7%)
.: 768 (33.3%)

final_comprehensive_fix_applied
Categorical

Constant 

Quality flag: Comprehensive corrections applied

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 45.0 KiB
1.0: 768

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2304
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 768 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Most occurring characters

Value | Count | Frequency (%)
1 | 768 | 33.3%
. | 768 | 33.3%
0 | 768 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1536 | 66.7%
Other Punctuation | 768 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 768 | 50.0%
0 | 768 | 50.0%
Other Punctuation
Value | Count | Frequency (%)
. | 768 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2304 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 768 | 33.3%
. | 768 | 33.3%
0 | 768 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2304 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 768 | 33.3%
. | 768 | 33.3%
0 | 768 | 33.3%
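Distinct (%) is the number of distinct values as a share of the values counted: 1 / 768 ≈ 0.13%, shown as 0.1%, and a column with a single distinct value is flagged Constant. A minimal sketch of both checks, assuming missing values are represented as `None` and excluded from the denominator (no values are missing in these columns, so that choice does not change the result here). The `columns` dict is a hypothetical in-memory copy rebuilt from the counts in this report, not the actual dataset.

```python
def distinct_stats(values):
    """Distinct count and distinct % over the non-missing values (None = missing)."""
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    pct = 100 * distinct / len(present) if present else 0.0
    return distinct, pct


def constant_columns(columns):
    """Names of columns whose non-missing values are all identical."""
    return [name for name, values in columns.items() if distinct_stats(values)[0] == 1]


# Hypothetical copy of the three quality-flag columns, rebuilt from reported counts
columns = {
    "cd4_correction_applied": ["0.0"] * 768,
    "final_comprehensive_fix_applied": ["1.0"] * 768,
    "waist_circ_unit_correction_applied": [True] * 563 + [False] * 205,
}

print(constant_columns(columns))
for name, values in columns.items():
    n, pct = distinct_stats(values)
    print(f"{name}: Distinct {n}, Distinct (%) {pct:.1f}%")
```

The two string-valued flags come out as constant, while the Boolean waist flag has 2 distinct values (2 / 768, shown as 0.3%).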

waist_circ_unit_correction_applied
Boolean

High correlation 

Quality flag: Waist circ unit corrected

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB

Value | Count | Frequency (%)
True | 563 | 73.3%
False | 205 | 26.7%
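Each frequency is the count divided by the 768 observations: 563 / 768 ≈ 73.3% and 205 / 768 ≈ 26.7%. A minimal sketch of the same calculation, using a list rebuilt from the reported counts rather than the actual dataset:

```python
from collections import Counter


def value_counts(values):
    """(value, count, frequency %) rows, most common first, like the Common Values tables."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, f"{100 * count / n:.1f}%") for value, count in counts.most_common()]


# Rebuilt from the counts reported for waist_circ_unit_correction_applied
flags = [True] * 563 + [False] * 205
for value, count, freq in value_counts(flags):
    print(value, count, freq)
```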

Interactions

[Figure: grid of 64 pairwise interaction plots]

Correlations

Variable | Age (at enrolment) | BMI (kg/m²) | Waist circumference (cm) | hdl_cholesterol_mg_dL | height_m | ldl_cholesterol_mg_dL | total_cholesterol_mg_dL | waist_circ_unit_correction_applied | weight_kg
Age (at enrolment) | 1.000 | 0.229 | 0.285 | 0.012 | -0.037 | 0.157 | 0.155 | 0.000 | 0.218
BMI (kg/m²) | 0.229 | 1.000 | 0.895 | 0.020 | -0.112 | 0.106 | 0.096 | 0.000 | 0.937
Waist circumference (cm) | 0.285 | 0.895 | 1.000 | 0.004 | 0.047 | 0.091 | 0.111 | 1.000 | 0.896
hdl_cholesterol_mg_dL | 0.012 | 0.020 | 0.004 | 1.000 | -0.021 | 0.269 | 0.507 | 0.102 | 0.021
height_m | -0.037 | -0.112 | 0.047 | -0.021 | 1.000 | -0.081 | -0.039 | 1.000 | 0.185
ldl_cholesterol_mg_dL | 0.157 | 0.106 | 0.091 | 0.269 | -0.081 | 1.000 | 0.566 | 0.094 | 0.093
total_cholesterol_mg_dL | 0.155 | 0.096 | 0.111 | 0.507 | -0.039 | 0.566 | 1.000 | 0.125 | 0.084
waist_circ_unit_correction_applied | 0.000 | 0.000 | 1.000 | 0.102 | 1.000 | 0.094 | 0.125 | 1.000 | 1.000
weight_kg | 0.218 | 0.937 | 0.896 | 0.021 | 0.185 | 0.093 | 0.084 | 1.000 | 1.000
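The flag waist_circ_unit_correction_applied scores 1.000 against waist, height, and weight. In the sample rows of this report, every row with the flag set to False has NaN for waist, weight, and height, which suggests (but does not prove) that these entries partly reflect a shared missingness pattern rather than a relationship between measured values; the correlation method behind this table is not stated in this section. A minimal check over the 20 sample rows only, assuming they are representative (a real check would need the full 768 rows); `COLUMNS`, `SAMPLE`, and `flag_mismatches` are illustrative names.

```python
import math

NaN = float("nan")

# Position of each measurement inside the SAMPLE tuples below
COLUMNS = {"waist_cm": 1, "weight_kg": 2, "height_m": 3}

# (row, Waist circumference (cm), weight_kg, height_m,
#  waist_circ_unit_correction_applied), copied from the Sample tables
SAMPLE = [
    (217, 83.0, 59.8, 1.584, True), (218, 103.0, 83.9, 1.589, True),
    (219, NaN, NaN, NaN, False), (220, 102.0, 84.7, 1.598, True),
    (221, 89.0, 76.0, NaN, True), (222, 77.0, 68.0, 1.762, True),
    (223, NaN, NaN, NaN, False), (224, 77.0, 65.0, 1.759, True),
    (225, 77.0, 66.1, NaN, True), (226, 69.0, 51.8, 1.646, True),
    (975, 68.0, 50.4, 1.538, True), (976, NaN, NaN, NaN, False),
    (977, 72.0, 54.6, 1.525, True), (978, 97.0, 77.8, 1.554, True),
    (979, NaN, NaN, NaN, False), (980, 115.5, 90.8, 1.562, True),
    (981, 103.0, 91.1, NaN, True), (982, 101.0, 84.6, 1.630, True),
    (983, NaN, NaN, NaN, False), (984, 104.0, 87.2, 1.627, True),
]


def flag_mismatches(rows, column):
    """Row ids where the flag disagrees with 'this measurement is present (not NaN)'."""
    pos = COLUMNS[column]
    return [row[0] for row in rows if row[4] != (not math.isnan(row[pos]))]


for name in COLUMNS:
    print(name, flag_mismatches(SAMPLE, name))
```

In these rows the flag matches the presence of waist and weight exactly, while height_m is also missing in three rows where the flag is True (221, 225, 981).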

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you visually pick out patterns in data completion.]
[Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.]
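Assuming nullity correlation is computed as the Pearson correlation between per-column missing/present indicators (our understanding of how the missingno heatmap works; the report itself does not state the formula), it can be sketched with the standard library. The example columns are the waist, weight, and height values of the first ten sample rows, with NaN written as `None`.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if a column is always or never missing
    return cov / (sx * sy)


def is_null(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def nullity_correlation(col_a, col_b):
    """Correlate the 1.0 (missing) / 0.0 (present) indicators of two columns."""
    return pearson([float(is_null(v)) for v in col_a], [float(is_null(v)) for v in col_b])


# Sample rows 217 to 226
waist = [83.0, 103.0, None, 102.0, 89.0, 77.0, None, 77.0, 77.0, 69.0]
weight = [59.8, 83.9, None, 84.7, 76.0, 68.0, None, 65.0, 66.1, 51.8]
height = [1.584, 1.589, None, 1.598, None, 1.762, None, 1.759, None, 1.646]

print(round(nullity_correlation(waist, weight), 3))
print(round(nullity_correlation(waist, height), 3))
```

Waist and weight are missing in exactly the same rows, so their indicator correlation is 1; height has two extra gaps, which lowers its correlation with waist to about 0.612.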

Sample

Row | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | latitude | longitude | province | city | jhb_subregion | BMI (kg/m²) | Waist circumference (cm) | weight_kg | height_m | hdl_cholesterol_mg_dL | ldl_cholesterol_mg_dL | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
217 | HEAT_0D62D894DA8A | WBS 001 | JHB_DPHRU_013 | 2011-02-10 | 19.4 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 24.2 | 83.0 | 59.8 | 1.584 | 1.23 | 1.41 | 2.77 | 0.0 | 1.0 | True
218 | HEAT_4D7DE69432FE | WBS 003 | JHB_DPHRU_013 | 2011-04-09 | 39.4 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.6 | 103.0 | 83.9 | 1.589 | 0.90 | 1.54 | 4.93 | 0.0 | 1.0 | True
219 | HEAT_4D7DE69432FE | WBS 003 | JHB_DPHRU_013 | 2012-01-21 | NaN | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.1 | NaN | NaN | NaN | 1.33 | 2.20 | 5.11 | 0.0 | 1.0 | False
220 | HEAT_4D7DE69432FE | WBS 003 | JHB_DPHRU_013 | 2012-04-02 | 40.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.5 | 102.0 | 84.7 | 1.598 | 1.61 | 2.37 | 5.35 | 0.0 | 1.0 | True
221 | HEAT_4D7DE69432FE | WBS 003 | JHB_DPHRU_013 | 2013-05-16 | 42.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 30.1 | 89.0 | 76.0 | NaN | 1.71 | 3.36 | 5.89 | 0.0 | 1.0 | True
222 | HEAT_1B67A8E196F0 | WBS 004 | JHB_DPHRU_013 | 2011-03-19 | 39.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 22.0 | 77.0 | 68.0 | 1.762 | 1.16 | 3.03 | 3.97 | 0.0 | 1.0 | True
223 | HEAT_1B67A8E196F0 | WBS 004 | JHB_DPHRU_013 | 2011-08-27 | 40.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.5 | NaN | NaN | NaN | 0.52 | 2.48 | 2.52 | 0.0 | 1.0 | False
224 | HEAT_1B67A8E196F0 | WBS 004 | JHB_DPHRU_013 | 2012-02-09 | 40.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.2 | 77.0 | 65.0 | 1.759 | 0.95 | 2.71 | 4.17 | 0.0 | 1.0 | True
225 | HEAT_1B67A8E196F0 | WBS 004 | JHB_DPHRU_013 | 2013-05-09 | 41.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.6 | 77.0 | 66.1 | NaN | 1.04 | 2.60 | 4.47 | 0.0 | 1.0 | True
226 | HEAT_42F4CBECF58F | WBS 005 | JHB_DPHRU_013 | 2011-03-17 | 22.2 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 19.3 | 69.0 | 51.8 | 1.646 | 0.90 | 2.17 | 3.13 | 0.0 | 1.0 | True

Row | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | latitude | longitude | province | city | jhb_subregion | BMI (kg/m²) | Waist circumference (cm) | weight_kg | height_m | hdl_cholesterol_mg_dL | ldl_cholesterol_mg_dL | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
975 | HEAT_0B3BD856B19E | WBS 311 | JHB_DPHRU_013 | 2011-06-11 | 25.9 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 21.5 | 68.0 | 50.4 | 1.538 | 1.14 | 1.57 | 4.96 | 0.0 | 1.0 | True
976 | HEAT_0B3BD856B19E | WBS 311 | JHB_DPHRU_013 | 2012-01-21 | NaN | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 22.6 | NaN | NaN | NaN | 1.32 | 2.19 | 6.24 | 0.0 | 1.0 | False
977 | HEAT_0B3BD856B19E | WBS 311 | JHB_DPHRU_013 | 2012-05-12 | 27.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 23.6 | 72.0 | 54.6 | 1.525 | 1.96 | 2.68 | 6.82 | 0.0 | 1.0 | True
978 | HEAT_05B2FB1A51B9 | WBS 312 | JHB_DPHRU_013 | 2011-06-11 | 33.3 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 32.4 | 97.0 | 77.8 | 1.554 | 1.10 | 1.05 | 3.70 | 0.0 | 1.0 | True
979 | HEAT_05B2FB1A51B9 | WBS 312 | JHB_DPHRU_013 | 2011-11-16 | 34.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 34.4 | NaN | NaN | NaN | 1.40 | 1.72 | 5.32 | 0.0 | 1.0 | False
980 | HEAT_05B2FB1A51B9 | WBS 312 | JHB_DPHRU_013 | 2012-05-02 | 34.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 37.3 | 115.5 | 90.8 | 1.562 | 1.76 | 1.32 | 4.11 | 0.0 | 1.0 | True
981 | HEAT_05B2FB1A51B9 | WBS 312 | JHB_DPHRU_013 | 2013-05-08 | 35.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 37.9 | 103.0 | 91.1 | NaN | 0.42 | 1.35 | 2.35 | 0.0 | 1.0 | True
982 | HEAT_59DF578C032D | WBS 313 | JHB_DPHRU_013 | 2011-06-07 | 31.3 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 31.8 | 101.0 | 84.6 | 1.630 | 0.91 | 1.00 | 3.52 | 0.0 | 1.0 | True
983 | HEAT_59DF578C032D | WBS 313 | JHB_DPHRU_013 | 2011-11-10 | 32.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 31.2 | NaN | NaN | NaN | 1.02 | 0.59 | 2.93 | 0.0 | 1.0 | False
984 | HEAT_59DF578C032D | WBS 313 | JHB_DPHRU_013 | 2012-05-02 | 32.0 | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 33.2 | 104.0 | 87.2 | 1.627 | NaN | NaN | NaN | 0.0 | 1.0 | True
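The cholesterol values in these sample rows (HDL 0.42 to 1.96, total 2.35 to 6.82) have the magnitudes usually reported in mmol/L, whereas the column names say mg/dL; an HDL of about 1 mg/dL would be implausibly low. This report alone cannot settle whether the values or the column names are wrong. A minimal plausibility check, assuming 38.67 as the usual cholesterol conversion factor from mmol/L to mg/dL; the 20.0 cut-off and all names below are illustrative assumptions, not a clinical rule or part of the original pipeline.

```python
# Commonly used factor for converting cholesterol from mmol/L to mg/dL
MG_DL_PER_MMOL_L = 38.67

# Hypothetical heuristic: lipid columns whose values all stay below this look like mmol/L
MMOL_L_UPPER_BOUND = 20.0

# hdl_cholesterol_mg_dL and total_cholesterol_mg_dL from the Sample tables (None = NaN)
HDL = [1.23, 0.90, 1.33, 1.61, 1.71, 1.16, 0.52, 0.95, 1.04, 0.90,
       1.14, 1.32, 1.96, 1.10, 1.40, 1.76, 0.42, 0.91, 1.02, None]
TOTAL = [2.77, 4.93, 5.11, 5.35, 5.89, 3.97, 2.52, 4.17, 4.47, 3.13,
         4.96, 6.24, 6.82, 3.70, 5.32, 4.11, 2.35, 3.52, 2.93, None]


def looks_like_mmol_per_l(values, upper=MMOL_L_UPPER_BOUND):
    """True if there is at least one value and every non-missing value is below the cut-off."""
    present = [v for v in values if v is not None]
    return bool(present) and max(present) < upper


def mmol_to_mg_dl(values):
    """Convert cholesterol from mmol/L to mg/dL (1 decimal), keeping None as missing."""
    return [None if v is None else round(v * MG_DL_PER_MMOL_L, 1) for v in values]


for name, values in [("hdl", HDL), ("total", TOTAL)]:
    if looks_like_mmol_per_l(values):
        print(name, "looks like mmol/L; first values in mg/dL:", mmol_to_mg_dl(values[:3]))
```

Read as mmol/L, the first HDL value (1.23) corresponds to roughly 47.6 mg/dL, which is within the usual adult range.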